Traditional approaches to analysing housing indexes, such as hedonic or repeat-sales models, can introduce selection bias or violate underlying assumptions. Moreover, they fail to capture key regional, social, or macroeconomic factors that play a significant role in driving housing price movements.
In recent research, Sijp et al. (2025) propose a new approach that applies Principal Component Analysis (PCA) to regional SA4-level housing indexes, enabling the extraction of key drivers of price movements and offering a deeper understanding of the forces shaping housing markets.
The PCA results show that the first three principal components capture much of the behaviour of local price indexes, as a linear combination of their time series explains most of the variance in the housing index. Nevertheless, the PCA-based linear model lacks robustness across different time windows, and its coefficient estimates must be derived directly from the PCA procedure.
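To make the variance-explained idea concrete, the sketch below shows how the share of variance captured by the leading principal components can be computed from the eigenvalues of the covariance matrix. This is an illustrative stdlib-only implementation on toy data, not the authors' code or the actual housing indexes; a single common factor drives three synthetic "regional" series, so PC1 dominates.

```python
# Illustrative sketch (toy data, not the actual SA4 indexes): variance
# explained by leading principal components, via power iteration with
# deflation on the covariance matrix.
import math
import random

def covariance(X):
    """Column covariance matrix of X (rows = time, columns = regions)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    C = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[j] - means[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                C[i][j] += d[i] * d[j] / (n - 1)
    return C

def top_eigenvalues(C, k, iters=500):
    """Leading k eigenvalues of a symmetric matrix by power iteration."""
    p = len(C)
    C = [row[:] for row in C]          # work on a copy; we deflate in place
    rng = random.Random(0)
    eigvals = []
    for _ in range(k):
        v = [rng.random() for _ in range(p)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
        eigvals.append(lam)
        for i in range(p):             # deflate: remove the found component
            for j in range(p):
                C[i][j] -= lam * v[i] * v[j]
    return eigvals

# Toy data: three correlated "regional" series driven by one common factor
rng = random.Random(1)
common = [rng.gauss(0, 1) for _ in range(200)]
X = [[c + rng.gauss(0, 0.1) for _ in range(3)] for c in common]
eigvals = top_eigenvalues(covariance(X), 3)
share_pc1 = eigvals[0] / sum(eigvals)
print(f"PC1 explains {share_pc1:.1%} of the variance")
```

Because the synthetic series share one strong common factor, the first component absorbs nearly all of the variance, mirroring the dominance of PC1 (market) in the housing indexes.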
Our exploratory research extends the PCA approach by addressing these limitations through a factor linear model, which uses time series data directly, with independent variables serving as proxies for the PCA-derived principal components.
The national trend (U) exhibits a close correspondence with PC1 (market), while the Perth–Sydney spread (δPS) aligns with PC2 (mining). Given that these principal components account for the majority of variance in the index, our preliminary analysis focuses on U and δPS, with subsequent extensions planned. (WIP - might add lifestyle)
We examined whether PC2 is better represented by the Brisbane–Sydney or Perth–Sydney spread by comparing the shapes of the standardised series and testing their correlations with PC2. The Perth–Sydney spread aligns more closely with PC2: the scatterplot shows a distribution that falls more evenly along the regression line, and its correlation with PC2 is slightly higher (98% versus 96% for the Brisbane–Sydney spread). While the difference is minor and not statistically significant, the Perth–Sydney spread is the stronger candidate as a proxy for PC2.
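The correlation comparison above amounts to computing Pearson correlations between each candidate spread and PC2 and picking the higher one. A minimal sketch with hypothetical stand-in series (not the actual spreads or PC2):

```python
# Illustrative sketch: comparing two candidate proxies' correlation with a
# reference series, as in the Perth-Sydney vs Brisbane-Sydney check.
# All series here are toy stand-ins, not the actual indexes.
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy example: pc2 is tracked tightly by spread_a, more noisily by spread_b
pc2 = [math.sin(t / 10) for t in range(120)]
spread_a = [v + 0.05 * math.cos(t) for t, v in enumerate(pc2)]
spread_b = [v + 0.30 * math.cos(t) for t, v in enumerate(pc2)]
print(pearson(pc2, spread_a), pearson(pc2, spread_b))
```

The tighter proxy yields the higher correlation, which is the basis for preferring the Perth–Sydney spread.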
The exploratory analysis utilises visualisations to characterise the dynamics of the national trend (U) and the Perth–Sydney spread (δPS), thereby clarifying their underlying properties.
Time-series plots:
U rises steadily from 2003 to 2012, transitions into a slower growth phase until 2019, then undergoes a sharp regime shift during 2020–2022 (the COVID period) followed by persistently elevated and volatile levels, with growth rates remaining time-varying throughout.
δPS displays a prolonged, uneven rise from 2003 to 2012, followed by a sharp regime shift, then COVID-related fluctuations, with the level adjusting and showing time-varying patterns.
Seasonal plots:
The apparent within-year slopes in U are dominated by the strong upward trend, with no systematic month-of-year effects.
δPS shows little evidence of month-to-month seasonality, with no particular month standing out, though its overall level varies between years.
Seasonal sub-series plot:
Each monthly panel of U shows a steady upward rise with mean value across months remaining similar, reinforcing that the series is driven more by trend than by seasonality.
The δPS seasonal sub-series plot reinforces the lack of a dominant seasonal pattern, as the monthly profiles are similar in shape and the corresponding means remain relatively flat and close to zero.
With the factors now established, we proceed to fit time series forecasting models to each series individually. As a first step, we examine the series in more detail. STL decomposition plots allow us to closely inspect the trend, seasonality, and remainder components, which helps determine the most suitable forecasting models for the series’ characteristics.
Stationarity tests (ADF and KPSS) indicate that both the national index (U) and the Perth–Sydney spread (PS) are non-stationary, so differencing is required. However, based on economic intuition that the PS series may be mean-reverting, we applied the Zivot–Andrews (ur.za) test, which confirmed that PS could be stationary with several structural breaks.
The dataset was partitioned into training and testing subsets, with one year of observations allocated to the testing set. Given that the data are recorded at a monthly frequency and span only 25 years, this allocation strikes a reasonable balance between retaining sufficient data for model training and preserving an adequate sample for evaluation.
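The split described above simply holds out the final 12 monthly observations. A minimal sketch (variable names are illustrative):

```python
# Minimal sketch of the train/test split: hold out the last 12 monthly
# observations for evaluation. The data here are placeholders.
def split_last_year(series, horizon=12):
    """Return (train, test) with the final `horizon` observations held out."""
    return series[:-horizon], series[-horizon:]

monthly = list(range(300))        # stand-in for 25 years of monthly data
train, test = split_last_year(monthly)
print(len(train), len(test))      # 288 12
```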
The STL decomposition of the national trend U reveals a smooth and persistent upward trajectory in the trend component. The seasonal component is small in magnitude, although the annual pattern is strong and repeats consistently. To stabilise variation in the seasonal component, a Box–Cox transformation was applied using the Guerrero method, which produced a λ value of 0.9; given its proximity to one, the transformation had a negligible effect on the modelling outcome. The remainder component displays mostly small noise, with a few larger shocks.
The modelling strategy was guided by exploratory data analysis and the STL decomposition, which suggested a random walk with drift, ARIMA(U ∼ pdq(0,1,0)), as an appropriate baseline specification.
To refine the model, we subsequently evaluated neighbouring specifications with alternative autoregressive and moving average terms, using AICc values and residual diagnostics as selection criteria.
In the modelling process, we evaluated a range of specifications informed both by diagnostic checks and prior knowledge. Inspection of the residuals suggested some semi-annual variation, and Fourier terms with K = 2 were added to account for potential higher-frequency seasonality. We also investigated seasonal extensions with P = 1, alongside piecewise specifications with knots aligned to major economic events: the Global Financial Crisis (2008), the end of the mining investment boom (2012), and the onset of the COVID-19 pandemic (2020). These variations were assessed with respect to residual behaviour, statistical significance, and their ability to capture shocks apparent in the remainder component.
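The Fourier terms with K = 2 amount to two sine/cosine pairs at the annual and semi-annual frequencies for monthly data (period 12). The sketch below generates such a regressor matrix; it illustrates the construction rather than reproducing the fable/forecast implementation used in the analysis.

```python
# Illustrative sketch: Fourier regressors for monthly data with K = 2,
# capturing annual (k = 1) and semi-annual (k = 2) cycles.
import math

def fourier_terms(n, period=12, K=2):
    """Return an n x 2K matrix of Fourier seasonal regressors."""
    rows = []
    for t in range(1, n + 1):
        row = []
        for k in range(1, K + 1):
            row.append(math.sin(2 * math.pi * k * t / period))
            row.append(math.cos(2 * math.pi * k * t / period))
        rows.append(row)
    return rows

X = fourier_terms(24)
print(len(X), len(X[0]))   # 24 observations, 4 regressors
```

Because the regressors are periodic with period 12, the values repeat exactly one year apart, which is what allows them to model a stable annual pattern with few parameters.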
Among the fitted models, two candidates emerged as particularly strong. The first was an ARIMA(2,1,1) with Fourier terms (K = 2) and drift. The second combined Fourier terms (K = 2) with intervention dummies (step and pulse) and an ARIMA(2,1,1) structure.
## # A tibble: 8 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 step 0.00000672 1338. -2650. -2649. -2602. <cpl [26]> <cpl [1]>
## 2 arima211 0.00000700 1331. -2640. -2639. -2599. <cpl [26]> <cpl [1]>
## 3 arima110 0.00000712 1327. -2637. -2636. -2604. <cpl [25]> <cpl [0]>
## 4 arima111 0.00000715 1327. -2635. -2634. -2598. <cpl [25]> <cpl [1]>
## 5 sarima211 0.00000785 1312. -2611. -2611. -2589. <cpl [14]> <cpl [1]>
## 6 autoarima 0.00000774 1308. -2609. -2609. -2598. <cpl [24]> <cpl [0]>
## 7 piece 0.00000791 1312. -2606. -2606. -2573. <cpl [1]> <cpl [25]>
## 8 rw_drift 0.0000416 1066. -2127. -2127. -2116. <cpl [0]> <cpl [12]>
Although both models fall short of strictly meeting the residual whiteness criterion, particularly since the degrees of freedom were not clearly defined in the test results, we use them here mainly for relative comparison of performance. Visual inspection of the residual plots suggests that the residuals are approximately normally distributed, with only a small share of autocorrelation coefficients exceeding the significance bounds in the ACF plot. As a general rule of thumb, if fewer than about 5% of the lags lie outside these bounds, the residuals can reasonably be treated as white noise.
## # A tibble: 8 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 arima211 18.4 0.103
## 2 step 20.4 0.0595
## 3 sarima211 21.7 0.0411
## 4 autoarima 23.0 0.0279
## 5 arima110 24.7 0.0165
## 6 arima111 24.8 0.0157
## 7 piece 28.9 0.00411
## 8 rw_drift 885. 0
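The 5% rule of thumb mentioned above can be checked directly from the sample ACF of the residuals: count the autocorrelations outside the approximate ±1.96/√n bounds and compare against 5% of the lags tested. The sketch below uses toy residuals, not the fitted models' residuals.

```python
# Illustrative sketch: the ACF white-noise rule of thumb. A series whose
# sample autocorrelations rarely exceed +/- 1.96/sqrt(n) can reasonably be
# treated as white noise. Toy data only.
import itertools
import math
import random

def acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / c0
            for lag in range(1, max_lag + 1)]

def looks_white(residuals, max_lag=24):
    """True if at most ~5% of lags breach the significance bounds."""
    bound = 1.96 / math.sqrt(len(residuals))
    exceed = sum(1 for r in acf(residuals, max_lag) if abs(r) > bound)
    return exceed / max_lag <= 0.05

rng = random.Random(42)
white = [rng.gauss(0, 1) for _ in range(300)]
walk = list(itertools.accumulate(white))   # strongly autocorrelated
print(looks_white(white), looks_white(walk))
```

A random walk fails the check at essentially every lag, while independent noise breaches the bounds only occasionally, which is the contrast the rule of thumb exploits.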
The mining trend (PS) oscillates around zero and appears mean-reverting, with no evidence of a permanent trend. Seasonality is present but relatively stable across years, while longer multi-year swings suggest the need for more flexible seasonal modelling. The residuals still display structure, with a clear AR(1) cut-off rather than pure white noise.
The mining trend (PS) captures the dynamics of the resource boom, which we interpret as a one-off structural break rather than a permanent driver. Prices surged sharply during the 2000s mining boom, collapsed around 2012, and have remained stagnant since. If this raw factor is carried forward into forecasts, the boom–bust cycle dominates and produces unstable predictions with excessively wide intervals.
To address this, we treat the boom as a temporary structural shock. We introduce a dummy variable (set to 1 during the boom/decline period and 0 otherwise) when modelling PS, and then forecast only the residual stationary component. This approach effectively assumes that another boom of comparable scale is unlikely.
The boom and decline phases were identified using the breakpoints function from the strucchange package, which detected three structural breaks in the series. The segment containing the global peak was classified as the “boom” period, followed by the “decline” period. These periods were then encoded into dummy variables for use in the modelling process.
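The dummy encoding itself is straightforward once the break dates are known: mark the boom segment and the decline segment with 0/1 indicators. The sketch below illustrates the mechanics; the break positions here are hypothetical placeholders, not the dates detected by strucchange.

```python
# Illustrative sketch: encoding boom/decline segments as 0/1 dummy
# regressors. Break positions are hypothetical placeholders.
def segment_dummies(n, boom_start, boom_end, decline_end):
    """Return (boom, decline) indicator lists over n time steps."""
    boom = [1 if boom_start <= t < boom_end else 0 for t in range(n)]
    decline = [1 if boom_end <= t < decline_end else 0 for t in range(n)]
    return boom, decline

boom, decline = segment_dummies(20, boom_start=5, boom_end=10, decline_end=15)
print(boom)
print(decline)
```

The two indicators never overlap, so each absorbs the level shift of its own segment when included as a regressor.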
To model the series with the dummy variables, we essentially fit an ARIMA model to the residuals after accounting for the boom and decline periods. The final selected model was an ARIMA(2,0,1) with the boom and decline dummies included as regressors.
## Series: PS
## Model: LM w/ ARIMA(2,0,1) errors
##
## Coefficients:
## ar1 ar2 ma1 boom decline
## 1.8741 -0.8763 -0.3764 -0.0081 0.0014
## s.e. 0.0389 0.0388 0.0829 0.0041 0.0041
##
## sigma^2 estimated as 4.359e-05: log likelihood=1061.23
## AIC=-2110.45 AICc=-2110.16 BIC=-2088.33
For the forecasting stage, we use the best-performing models (ARIMA(2,1,1) with Fourier terms (K = 2) for U and ARIMA(2,0,1) for PS) identified earlier to project values 10 years ahead (120 months). We generate forecasts with prediction intervals at both 80% and 95%. These intervals show the range in which future values are likely to fall, giving us a measure of uncertainty: an 80% interval means there is an 8-in-10 chance that the true value will lie within that range.
For evaluation, we set aside one year of data (2023) as a test set. Although a typical split is about 20% of the dataset, our series only spans 25 years, so using 20% would cut off too much valuable information for forecasting. A one-year test period strikes a better balance.
The forecasts indicate that the national trend is expected to rise, with an 80% prediction interval width of 0.6845, reflecting higher uncertainty. In contrast, the mining trend (PS) shows a flatter path, with an 80% interval width of 0.4538. This narrower range reflects greater stability, largely because we excluded the one-off boom and decline period, which would otherwise make the series more volatile.
The test set results show that the chosen model for U performs consistently with very low error (RMSE = 0.0076, negligible relative to the scale of U). For PS, the model incorporating boom/decline dummies produces out-of-sample forecasts that are on average about 7% off, which is acceptable given the series’ inherently difficult-to-predict behaviour (mining cycle is more volatile than the national market). While the ACF1 values indicate some remaining autocorrelation in the residuals, overall both models demonstrate solid performance.
## # A tibble: 2 × 6
## series .model .type RMSE MAE MAPE
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 U (national) arima211 Test 0.00756 0.00487 0.261
## 2 PS (mining) arima201 Test 0.0116 0.00807 6.83
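The accuracy measures reported above follow the standard definitions. A minimal sketch on toy values (not the actual forecasts):

```python
# Illustrative sketch: RMSE, MAE and MAPE as used in the accuracy table.
# Toy actual/forecast values, not the real test-set results.
import math

def accuracy(actual, forecast):
    """Return (RMSE, MAE, MAPE in percent) for paired series."""
    errors = [a - f for a, f in zip(actual, forecast)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    mape = 100 * sum(abs(e / a) for a, e in zip(actual, errors)) / len(errors)
    return rmse, mae, mape

actual = [2.0, 2.1, 2.2, 2.3]
forecast = [2.0, 2.0, 2.3, 2.2]
rmse, mae, mape = accuracy(actual, forecast)
print(round(rmse, 4), round(mae, 4), round(mape, 2))
```

Note that MAPE is scale-free (a percentage of the actual level), which is why the small RMSE for U and the roughly 7% MAPE for PS can be compared across series of different scales.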
We now turn to evaluating how well the factor model captures the variation in individual major city and SA4 indexes. R² measures the share of variation in each major city or SA4 housing index that our factor model explains.
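Concretely, R² is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch with toy numbers (not an actual regional index):

```python
# Illustrative sketch: R-squared as the share of variance in a regional
# index explained by the factor model's fitted values. Toy data only.
def r_squared(actual, fitted):
    """Coefficient of determination for fitted vs actual values."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    return 1 - ss_res / ss_tot

index = [1.0, 1.2, 1.5, 1.4, 1.8]
fitted = [1.0, 1.25, 1.45, 1.45, 1.75]
print(round(r_squared(index, fitted), 4))   # 0.9728
```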
For major cities and rest-of-state areas: most cities and regions have an R² value above 0.98. The extremes highlight clear spatial patterns:
Top cities and rest-of-state: Large east-coast capitals and their surrounding belts.
Bottom cities and rest-of-state: Smaller or resource-dependent markets.
| Rank | Region | R² | Rank | Region | R² |
|---|---|---|---|---|---|
| 1 | GREATER MELBOURNE | 0.9985 | 15 | REST OF WA | 0.9697 |
| 2 | REST OF NSW | 0.9985 | 14 | GREATER PERTH | 0.9735 |
| 3 | REST OF QLD | 0.9983 | 13 | REST OF NT | 0.9800 |
| 4 | GREATER SYDNEY | 0.9981 | 12 | GREATER HOBART | 0.9857 |
| 5 | GREATER BRISBANE | 0.9979 | 11 | GREATER DARWIN | 0.9860 |
For SA4 regions: the SA4 regions tell the same story. R² is tightly concentrated near 1.0, with the majority of SA4s exceeding 0.98 and only a small left tail (a few around 0.92–0.95). The top and bottom lists identify the extremes and show recurring patterns across those regions:
| Rank | Region | R² | Rank | Region | R² |
|---|---|---|---|---|---|
| 1 | MELBOURNE - SOUTH EAST | 0.9982 | 87 | WESTERN AUSTRALIA - OUTBACK (NORTH) | 0.9184 |
| 2 | MELBOURNE - OUTER EAST | 0.9979 | 86 | MANDURAH | 0.9460 |
| 3 | BRISBANE - WEST | 0.9975 | 85 | WESTERN AUSTRALIA - OUTBACK (SOUTH) | 0.9601 |
| 4 | BRISBANE - SOUTH | 0.9974 | 84 | BUNBURY | 0.9606 |
| 5 | BRISBANE - NORTH | 0.9974 | 83 | WESTERN AUSTRALIA - WHEAT BELT | 0.9635 |
| 6 | ADELAIDE - WEST | 0.9972 | 82 | MACKAY - ISAAC - WHITSUNDAY | 0.9678 |
| 7 | MELBOURNE - NORTH EAST | 0.9972 | 81 | PERTH - SOUTH EAST | 0.9691 |
| 8 | ADELAIDE - CENTRAL AND HILLS | 0.9972 | 80 | PERTH - NORTH EAST | 0.9695 |
| 9 | SYDNEY - BAULKHAM HILLS AND HAWKESBURY | 0.9970 | 79 | PERTH - SOUTH WEST | 0.9732 |
| 10 | WIDE BAY | 0.9970 | 78 | CENTRAL QUEENSLAND | 0.9743 |
Both SA4 regions and major cities display highly concentrated R² distributions with major cities showing less variation.
The choropleth reinforces the findings from the summary table: the east-coast capitals and coastal belts from Melbourne through Sydney into South-East Queensland show the strongest fit, and most of regional NSW/VIC is also high. Fit weakens inland and to the west, notably across Western Australia and along Queensland’s mining belts. The dark purple areas mark the lowest R² and are typically remote, resource-exposed regions with possible boom–bust timing and sparse transactions that our three factors do not fully capture.
Hence, we will prioritise Western Australia and Queensland’s mining-exposed regions for residual diagnostics, where we expect autocorrelation and regime shifts, as candidates for trial enhancements.
At this stage, the model does not incorporate any time dynamics in the residuals. While it explains a substantial share of the variance for both cities and regions, the trending and autocorrelated nature of housing indexes means that the residuals themselves display extremely strong positive autocorrelation. This is evident in the Durbin–Watson statistics, which are close to zero across all regions: Major cities cluster around 0.02 to 0.05, while SA4 regions have a slightly broader range, extending up to 0.2, but still remain far below the benchmark value of 2. This pattern underscores the limitations of using a static regression framework without accounting for time-series dynamics.
## # A tibble: 2 × 6
## region_level dw_min dw_max dw_mean dw_median range
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 major_city 0.0146 0.104 0.0338 0.0282 0.0891
## 2 sa4_name 0.0211 0.201 0.0680 0.0598 0.180
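The Durbin–Watson statistic summarised above is the ratio of the sum of squared successive residual differences to the sum of squared residuals: values near 0 indicate strong positive autocorrelation, values near 2 suggest none. A sketch on toy residuals:

```python
# Illustrative sketch: the Durbin-Watson statistic. A trending (random walk)
# series scores near 0, independent noise near 2. Toy data only.
import itertools
import random

def durbin_watson(residuals):
    """DW statistic: sum of squared first differences over sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(500)]
trending = list(itertools.accumulate(noise))   # highly autocorrelated
print(round(durbin_watson(trending), 3), round(durbin_watson(noise), 3))
```

This is exactly the pattern in the table: residuals from the static factor regressions behave like the trending series, not like white noise.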
While R² and Durbin–Watson assess overall fit and residual behaviour, they do not show how regions load onto each factor. Examining coefficient patterns adds this context, revealing whether markets move uniformly with the national trend or diverge due to mining influences.
The distribution of market coefficients is tightly centred around 1.0 with relatively little variation. This indicates that most regions respond in a similar way to the national housing trend. In contrast, the mining coefficients are far more dispersed with a long right tail and several extreme outliers, suggesting that exposure to mining-related dynamics differs substantially across regions.
The scatterplot analysis reveals a clear negative relationship between the market and mining coefficients. Regions with a strong loading on the national market factor tend to exhibit weaker or even negative loadings on the mining factor and vice versa.
(WIP - examine numbers since we switched to PS; Add more methodologies with the forecast model should be built at this stage)
To justify including δPS alongside the national trend U in the factor model, we need to show that δPS is more bounded over time, meaning it stays within a narrower range and has lower long-term volatility. We assess this by comparing δPS and U using overall statistics and rolling-window measures.
We next applied rolling windows of 12 and 24 months to the same statistics. Rather than using the entire time span at once, this approach slides a fixed-length window along the series, calculating the variability measures at each step using only the most recent 12 or 24 months of data. This method highlights how the volatility and dispersion of δPS and U evolve over time.
We use 12-month windows to smooth short-term fluctuations and capture typical annual housing cycles, helping assess if volatility is bounded within a year. The 24-month windows extend the view, showing whether this boundedness holds beyond yearly patterns and reflecting medium-term events.
To keep the report concise, we highlight two plots: the 12-month rolling standard deviation for short-term volatility and the 24-month rolling range for longer-term fluctuations.
Finally, we calculated the proportion of time (by 12 and 24 months respectively) in which δPS had a lower spread than U, giving a simple measure of how often it was more bounded.
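The rolling-window machinery above can be sketched as follows: slide a fixed window along each series, compute a dispersion measure (standard deviation or range) per window, then take the share of windows in which one series is tighter. The series below are deterministic toy stand-ins, chosen so that the bounded series is tighter in every window; they are not the real U and δPS.

```python
# Illustrative sketch: rolling standard deviation and range, plus the share
# of windows in which one series is less dispersed than the other.
# Toy series only, not the actual U and dPS data.
import math

def rolling(series, window, fn):
    """Apply fn to each length-`window` slice of the series."""
    return [fn(series[i - window:i]) for i in range(window, len(series) + 1)]

def std(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def spread(xs):
    return max(xs) - min(xs)

u = [0.01 * t for t in range(120)]              # trending, like U
dps = [0.02 * math.sin(t / 6) for t in range(120)]  # bounded oscillation
sd_u, sd_dps = rolling(u, 12, std), rolling(dps, 12, std)
rg_u, rg_dps = rolling(u, 24, spread), rolling(dps, 24, spread)
share_12 = sum(a < b for a, b in zip(sd_dps, sd_u)) / len(sd_u)
share_24 = sum(a < b for a, b in zip(rg_dps, rg_u)) / len(rg_u)
print(share_12, share_24)
```

In this toy setup the bounded series wins every window; on the real data the shares are well below one, which is exactly what the 38.6%/41.0% figures below quantify.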
The global ratios are all well below 1, meaning δPS has consistently lower spread than U across all four measures. This supports δPS being more bounded overall.
The rolling-window plots and statistics give a more detailed picture:
The 12-month rolling standard deviation plot shows that δPS is often less volatile than U, but not consistently (occasional spikes narrow the gap).
The 24-month rolling range plot shows δPS has wider swings at times, but also periods where it is more stable than U, highlighting its tendency to avoid large swings over longer periods.
δPS has a lower spread than U in around 38.6% of 12-month windows and 41.0% of 24-month windows. So although the global statistics favour δPS, it is the more bounded series in less than half of the rolling windows.